One of the most basic properties of the communicative sign is its dual nature. That is, a sign is a 
twofold entity composed of a formal component, which we call signal, and a referential component, 
namely a reference. Based on this conception, we say that a referent is coded in a particular sign, 
or that a sign is decoded in a particular referent. In selective scenarios it is crucial for the success 
of any adaptive innovation or communicative exchange that, if a particular referent a is coded in a 
particular signal s during the coding process, then the referent a is decoded from the sign s during 
the decoding process. In other words the referentiality of a signal must be preserved after being 
decoded, due to a selective pressure. Despite the information-theoretic flavour of this requirement, 
an inquiry into classical concepts of information theory such as entropy or mutual information 
will lead us to the conclusion that information theory as usually stated does not account for this 
very important requirement that natural communication systems must satisfy. Motivated by the 
relevance of the preservation of referentiality in evolution, we will fill this gap from a theoretical 
viewpoint, by deriving the consistent information conveyed from an arbitrary coding agent $A^v$ to 
an arbitrary decoding agent $A^u$ and discussing several of its interesting properties. 
I. INTRODUCTION 

Biological systems store and process information at 
many different scales (Yockey 1992). Organisms or cells 
react to changes in the external environment by gathering 
information and making the right decisions, once such 
information is properly interpreted. In a way, we can 
identify the external changes as input signals to be coded 
and decoded by the cellular machinery or information 
processing of neural networks, and include the exchange 
of signals between individuals or abstract agents sharing 

The ability to store information to interpret the surroundings 
beyond pure noise is thus an important property 
of biological systems. An organism or abstract agent 
can make use of this feature to react to the environment 
in a selectively advantageous way. This is possible provided 
that, in biological systems, a communicative signal 
is necessarily linked to a referential value, that is, 
it has a meaningful content. As pointed out by 
John Hopfield: 



Meaningful content, as distinct from noise 
entropy, can be distinguished by the fact that 
a change in a meaningful bit will have an effect 
on the macroscopic behavior of a system 
(Hopfield 1994). 



The meaningful content of information can be understood 
as something additional to classical information 
which is preserved through generations (or by the members 
of a given population in a given communicative exchange) 
resulting in a consistent response to the environment 
(Haken 1978). 



The explicit incorporation of the referential value in 
the information content is, in some sense, external to 
classical information theory, since, roughly speaking, 
the standard measure of mutual information only ac- 
counts for the relevance of correlations among sets of ran- 
dom variables. Indeed, one can establish configurations 
among coder and decoder by which mutual information 
is maximal but the referentiality value of the signal is 
lost during the communicative exchange. Let us con- 
sider the following example: Suppose a system where the 
event fire is coded as the signal a, and that such a signal 
a is always decoded as the event water. Suppose, also, 
that the event water is coded as the signal b and it is 
always decoded as fire. In this system, both the coder 
and the decoder depict a one-to-one mapping between 
input and output, and the mutual information between 
the set of events shared by coder and decoder would be 
maximum. However, if we take the system as a whole, 
the non-preservation of any referential value renders the 
communication code useless. 
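
The fire/water example can be checked numerically. The following Python sketch is only an illustration: it assumes that the two events are equiprobable (the text does not fix their probabilities), and the names `mutual_information`, `decoded_as` and `consistent_fraction` are hypothetical. It computes the mutual information between the emitted and the decoded event, together with the fraction of events that are decoded as themselves.

```python
from math import log2

def mutual_information(joint):
    """Mutual information in bits of a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Events: 0 = fire, 1 = water. Equal probabilities are an assumption.
mu = [0.5, 0.5]
# fire -> signal a -> water, water -> signal b -> fire (deterministic swap).
decoded_as = {0: 1, 1: 0}
joint = [[mu[i] if decoded_as[i] == j else 0.0 for j in range(2)]
         for i in range(2)]

mi = mutual_information(joint)
consistent_fraction = sum(joint[i][i] for i in range(2))
print(mi, consistent_fraction)
```

Under these assumptions the mutual information reaches its maximum of one bit while no event is ever decoded consistently, which is the mismatch the example describes.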

Not surprisingly, evolutionary experiments involving 
artificial agents (such as robots) include, as part of the 
selective pressures, the consistency of signals and refer- 
ents. If survival or higher scores depend on a fitness 
measure which requires a proper sharing of information, 
the final outcome of the dynamics is a set of agents using 
common signals to refer to the same object (Nolfi and 
Mirolli 2010; Steels 2001; Steels and Baillie 2003). Formally, 
we say that the communicative sign has a dual 
nature: a sign would involve a pair 

$$\langle m_i, s_k \rangle, \tag{1}$$

composed of a signal, $s_k$, and a referent, $m_i$. Such a pair 
must be conserved in a consistent communicative interchange. 

The problem of consistency of the communicative process 
was addressed early on in (Hurford 1989), through a 
formalism consisting of signal/referent matrices. Further 
works showed the suitability of such a formalism, and enabled 
the study of the emergence of consensus driven 
by selective forces (Nowak and Krakauer 1999). These 
studies showed that an evolutionary process could result 
in a code shared by a population of interacting agents. 
Under this framework, the existence of optimal solutions 
has been studied (Komarova and Niyogi 2004), as well as 
the problem of the information catastrophe or linguistic 
error limit (Nowak 2000), using evolutionary game theory 
involving a payoff function accounting for the average 
number of correctly referenced signals. 

It is the purpose of this theoretical work to rigorously 
identify the amount of information which conserves the 
dual structure of a sign, i.e., the amount of consistent 
information, and to explore some of its consequences. 
Specifically, we evaluate the relevance of the consistent 
input/output pairs, assuming that the input set and the 
output set are equal. The study of the behaviour of 
the consistent information displays interesting differences 
with classical Shannon's mutual information. 

We should properly differentiate the problem of consis- 
tency from the problem of absolute information content 
of a given signal -or, in general, mathematical object. 
The latter arises from the fact that, in Shannon's infor- 
mation theory, the information content of a given signal 
is computed from the relative abundance of such a signal 
against the occurrences of the whole set of signals. The 
information content of an isolated signal is not defined 
(or equal to zero). This is solved by the definition of 
the Kolmogorov Complexity (Cover and Thomas 1991; 
Kolmogorov 1965; Ming and Vitanyi 1997), which can 
be understood as the absolute information content of a 
given signal or mathematical object. Our purpose can 
be embedded in Shannon's framework. Accepting the 
relative nature of the information content, we attack the 
problem of the consistency of input/output pairs. 

The paper is written in a self-contained way. Thus, 
beyond basics of probability theory we properly introduce 
the concepts and the required mathematical apparatus. 
At the end of the paper, a case study (the classical binary 
symmetric channel) is described in detail. 

1 This central property of the communicative sign resembles the 
duality of the linguistic sign first pointed out by the Swiss 
linguist Ferdinand de Saussure (Saussure 1916). According to 
Saussure, a linguistic sign is a psychical unit with two faces: a signifier 
and a signified. The former term is close to our term 'signal' 
and the latter to our term 'reference'. There are, though, important 
differences between the information-theoretical approach we 
are about to develop and Saussure's conception of the linguistic 
sign. 



II. THE MINIMAL SYSTEM AND ITS ASSOCIATED 
INFORMATION MEASURES 

In this section we define the minimal system composed 
of two agents able to both code and decode a set of ex- 
ternal events. 



A. The communicative system 

Consider a set of (at least two) interacting agents 
living in a shared world (Komarova and Niyogi 2004). 
Agents communicatively interact through noisy channels. 
The description of this system is based on the probability 
transition matrices defining the coding and decoding 
processes, the probability transition matrix for the channel, 
and the random variables associated with the inputs 
and outputs, which account for the successive information 
processing through the system formed by two agents 
and the noisy channel (see Fig. 1). The qualitative difference 
with respect to the classical communication scheme 
is that we take into account the particular value of the input 
and the output, thereby capturing the referential value 
of the communicative exchange. An agent, $A^v$, is defined 
as a pair of computing devices, 

$$A^v = \{P^v, Q^v\}, \tag{2}$$



where $P^v$ is the coder module and $Q^v$ is the decoder module. 
The shared world is defined by a random variable 
$X_\Omega$ which takes values on the set of events, $\Omega$: 

$$\Omega = \{m_1, \ldots, m_n\}, \tag{3}$$



the (always non-zero) probability associated with any 
event $m_k \in \Omega$ being given by $\mu(m_k)$. The coder module, $P^v$, 
is described by a mapping from $\Omega$ to the set $S$: 

$$S = \{s_1, \ldots, s_n\}, \tag{4}$$



to be identified as the set of signals. For simplicity, here 
we assume $|\Omega| = |S| = n$. This mapping is realized according 
to the following matrix of transition probabilities: 

$$P^v_{ij} = \mathbb{P}_v(s_j \mid m_i), \tag{5}$$

which satisfies the following condition: 

$$(\forall m_i \in \Omega) \quad \sum_{j \le n} P^v_{ij} = 1. \tag{6}$$



The output of the coding process is described by the random 
variable $X_s$, taking values on $S$ according to the 
probability distribution $\nu$: 

$$\nu(s_i) = \sum_{k \le n} \mu(m_k) P^v_{ki}. \tag{7}$$

FIG. 1 The minimal communicative system to study the conservation 
of referentiality. (a) A shared world, whose events 
are the members of the set $\Omega$ and whose behavior is governed 
by the random variable $X_\Omega$. A coding engine, $P^v$, which performs 
a mapping between $\Omega$ and the set of signals $S$, $X_s$ being 
the random variable describing the behavior of the set 
of signals obtained after coding. The channel, $\Lambda$, may be 
noisy and, thus, the input of the decoding device, $Q^u$, depicted 
by $X'_s$, might be different from $X_s$. $Q^u$ performs a 
mapping from $S$ to $\Omega$ whose output is described by $X'_\Omega$. 
Whereas the mutual information provides us with a measure of the 
relevance of the correlations between $X_\Omega$ and $X'_\Omega$, the consistent 
information evaluates the relevance of the information 
provided by consistent pairs within the overall amount of information. 
In this context, from a purely information-theoretical 
point of view, situations like (b) and (c) could be indistinguishable. 
By defining the so-called consistent information we can 
properly differentiate (b) and (c) by evaluating the degree of 
consistency of input/output pairs (see text). 



The channel, $\Lambda$, is characterized by the $n \times n$ matrix 
of conditional probabilities $\mathbb{P}_\Lambda(S \mid S)$, i.e., 

$$\Lambda_{ij} = \mathbb{P}_\Lambda(s_j \mid s_i). \tag{8}$$



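The output of the composite system coder+channel, 
$P^v \Lambda$, is described by the random variable $X'_s$, which takes 
values on the set $S$ following the probability distribution 
$\nu'$, defined as: 

$$\nu'(s_i) = \sum_{k \le n} \mu(m_k)\, \mathbb{P}_{v\Lambda}(s_i \mid m_k), \tag{9}$$

where 

$$\mathbb{P}_{v\Lambda}(s_i \mid m_k) = \sum_{j \le n} P^v_{kj} \Lambda_{ji}. \tag{10}$$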



Finally, the decoder module is a computational device 
described by a mapping from $S$ to $\Omega$, i.e., it receives $S$ 
as the input set, emitted by another agent through the 
channel, and yields as output elements of the set $\Omega$. $Q^v$ 
is completely defined by its transition probabilities, i.e.: 

$$Q^v_{ik} = \mathbb{P}_v(m_k \mid s_i), \tag{11}$$

which satisfies the following condition: 

$$(\forall s_i \in S) \quad \sum_{k \le n} Q^v_{ik} = 1. \tag{12}$$



Additionally, we can impose another condition: 

$$(\forall m_j \in \Omega) \quad \sum_{i \le n} \mathbb{P}_v(m_j \mid s_i) = 1, \tag{13}$$



which is necessary for $A^v$ to reconstruct $\Omega$, i.e., if the 
population of interacting agents share the world. By imposing 
condition (13) we avoid configurations in which 
some $m_k \in \Omega$ cannot be referred to by the decoder 
agent. We notice that it is consistent with the fact that 
no element from $\Omega$ has zero probability to occur. Furthermore, 
we emphasize the assumption that, in a given 
agent $A^v$, following (Nowak and Krakauer 1999; Plotkin 
and Nowak 2000) but not (Hurford 1989; Komarova and 
Niyogi 2004), there is a priori no correlation between $P^v$ 
and $Q^v$. Finally, under the presence of another agent 
$A^u$, we can define the output of $Q^u$ as the random variable 
$X'_\Omega$, taking values on the set $\Omega$ and following the 
probability distribution $\mu'$, which takes the form: 

$$\mu'(m_i) = \sum_{l \le n} \mu(m_l)\, \mathbb{P}_{A^v \to A^u}(m_i \mid m_l), \tag{14}$$

where 

$$\mathbb{P}_{A^v \to A^u}(m_i \mid m_l) = \sum_{j,r \le n} P^v_{lj} \Lambda_{jr} Q^u_{ri}, \tag{15}$$

and the joint probabilities are 

$$\mathbb{P}_{A^v \to A^u}(m_i, m_j) = \mu(m_i) \sum_{l,r \le n} P^v_{il} \Lambda_{lr} Q^u_{rj}. \tag{16}$$

Consistently, 

$$(\forall m_i \in \Omega) \quad \sum_{j \le n} \mathbb{P}_{A^v \to A^u}(m_j \mid m_i) = 1. \tag{17}$$



Once we have the description of the different pieces of 
the problem, we proceed to study the couplings among 
them in order to obtain a suitable measure of the consistency 
of the communicative process. The first natural 
quantitative observable to account for the degree of 
consistency is the fraction of events $m_i \in \Omega$ which are 
consistently decoded. From eq. (16) it is straightforward 
to conclude that such a fraction, $F(A^v \to A^u)$, is given 
by: 

$$F(A^v \to A^u) = \sum_{i \le n} \mathbb{P}_{A^v \to A^u}(m_i, m_i). \tag{18}$$

And if we take into account that the communicative exchange 
takes place in both directions, we have: 

$$F(A^v, A^u) = \frac{1}{2} \left( F(A^v \to A^u) + F(A^u \to A^v) \right). \tag{19}$$



Putting aside slight variations, eq. (19) has been widely 
used as a payoff function to study the emergence of consistent 
codes (in terms of duality preservation) through 
an evolutionary process involving several agents in every 
generation (Hurford 1989; Komarova and Niyogi 2004; 
Nowak and Krakauer 1999; Plotkin and Nowak 2000). 
Such evolutionary dynamics yielded important results 
which help to understand how selective pressures push a 
population of communicating agents to reach a consensus 
in their internal codes. 
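
As a minimal sketch of how the payoff of eqs. (18) and (19) can be evaluated, the following Python fragment composes coder, channel and decoder and sums the diagonal of the resulting joint matrix. All numerical values and the function names (`matmul`, `joint_matrix`, `consistent_fraction`) are illustrative assumptions, not taken from the works cited above.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def joint_matrix(mu, P, Lam, Q):
    """Joint probabilities J[i][j] of emitting m_i and decoding m_j."""
    cond = matmul(matmul(P, Lam), Q)  # coder -> channel -> decoder
    n = len(mu)
    return [[mu[i] * cond[i][j] for j in range(n)] for i in range(n)]

def consistent_fraction(mu, P, Lam, Q):
    """Eq. (18): probability that the decoded event equals the emitted one."""
    J = joint_matrix(mu, P, Lam, Q)
    return sum(J[i][i] for i in range(len(mu)))

# Illustrative values only (hypothetical agents and channel).
mu = [0.5, 0.5]
Lam = [[0.9, 0.1], [0.1, 0.9]]
P_v, Q_v = [[0.8, 0.2], [0.1, 0.9]], [[0.0, 1.0], [1.0, 0.0]]
P_u, Q_u = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]

F_vu = consistent_fraction(mu, P_v, Lam, Q_u)  # A^v codes, A^u decodes
F_uv = consistent_fraction(mu, P_u, Lam, Q_v)  # A^u codes, A^v decodes
F_avg = 0.5 * (F_vu + F_uv)                    # eq. (19)
print(F_vu, F_uv, F_avg)
```

Because the coder and decoder of each agent are independent matrices, the two directions generally give different payoffs, and only their average enters eq. (19).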



B. Mutual Information 

Now we proceed to compute the mutual information 
among relevant variables of the system. We stress that 
it does not account for the referentiality of the sent sig- 
nals. Instead, it quantifies, in bits, the relevance of the 
correlations among two random variables, as a potential 
message conveyer system, never specifying the referential 
value of any sequence or signal. 

Let us briefly review some fundamental definitions and 
concepts of information theory. We know that, given two 
random variables $X, Y$, with associated probability functions 
$p(x), p(y)$, conditional probabilities $\mathbb{P}(x \mid y), \mathbb{P}(y \mid x)$ 
and joint probabilities $\mathbb{P}(x, y)$, their mutual information 
$I(X:Y)$ is defined as (Ash 1990; Cover and Thomas 
1991; Shannon 1948): 

$$I(X:Y) = \sum_{x,y} \mathbb{P}(x,y) \log \frac{\mathbb{P}(x,y)}{p(x)\,p(y)}, \tag{20}$$

or equivalently: 

$$I(X:Y) = H(X) - H(X \mid Y), \tag{21}$$



$H(X)$ being the Shannon entropy or uncertainty associated 
with the random variable $X$: 

$$H(X) = -\sum_{x} p(x) \log p(x), \tag{22}$$

and $H(X \mid Y)$ the conditional entropy or conditional uncertainty 
associated with the random variable $X$ with respect 
to the random variable $Y$: 

$$H(X \mid Y) = -\sum_{y} p(y) \sum_{x} \mathbb{P}(x \mid y) \log \mathbb{P}(x \mid y). \tag{23}$$

We can also define the joint entropy of two random 
variables $X, Y$, written as $H(X,Y)$: 

$$H(X,Y) = -\sum_{x,y} \mathbb{P}(x,y) \log \mathbb{P}(x,y). \tag{24}$$



A key concept of information theory is the so-called channel 
capacity, $C(\Lambda)$, which, roughly speaking, is the maximum 
number of bits that can be reliably processed by 
the system, namely: 

$$C(\Lambda) = \max_{p(x)} I(X:Y). \tag{25}$$



As usual, in our minimal system of two interacting agents 
we explicitly introduced the channel, $\Lambda$, as a matrix of 
transition probabilities between the two agents. Channel 
capacity is an intrinsic feature of the channel; as the 
fundamental theorem of information theory (Ash 1990; 
Cover and Thomas 1991; Shannon 1948) states, it is 
possible to send any message of $R$ bits through the channel 
with an arbitrarily small probability of error if: 

$$R < C(\Lambda); \tag{26}$$



otherwise, the probability of errors in transmission is no 
longer negligible. One should not confuse the statements 
concerning the capacity of the channel with the fact that, 
given a random variable with associated probability distribution 
$p(x)$, we have: 

$$\max I(X:Y) = H(X) = H(Y) \tag{27}$$

(provided that $C(\Lambda) > H(X)$). In those cases, we refer 
to the channel as noiseless. 



Let us now return to our system. Using eq. (20) and 
the joint probabilities derived in eq. (16), we can compute 
the mutual information between $X_\Omega$ and $X'_\Omega$ when $A^v$ 
is the coder and $A^u$ the decoder, to be noted $I(A^v \to A^u)$, 
as follows: 

$$I(A^v \to A^u) = \sum_{i,j \le n} \mathbb{P}_{A^v \to A^u}(m_i, m_j) \log \frac{\mathbb{P}_{A^v \to A^u}(m_i, m_j)}{\mu(m_i)\,\mu'(m_j)}. \tag{28}$$



Notice that, since the coding and decoding modules of a 
given agent are described by different, a priori unrelated 
matrices, in general 

$$I(A^v \to A^u) \neq I(A^u \to A^v). \tag{29}$$



The average of shared information between agents $A^v$ and 
$A^u$ will be: 

$$\langle I(A^v, A^u) \rangle = \frac{1}{2} \left( I(A^v \to A^u) + I(A^u \to A^v) \right). \tag{30}$$

Clearly, since the channel is the same in both directions 
of the communicative exchange, the following inequality 
holds: 

$$\langle I(A^v, A^u) \rangle \le C(\Lambda). \tag{31}$$

In the next section we investigate the role of the well-correlated 
pairs and their impact on the overall quantity of 
information. 
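
The asymmetry of eq. (29) and the bound of eq. (31) can be illustrated with a short Python sketch. The agents and the binary symmetric channel below are hypothetical values chosen for illustration; the only external fact used is the standard capacity of a binary symmetric channel with error probability $\epsilon$, namely $1 - H_2(\epsilon)$.

```python
from math import log2

def joint_matrix(mu, P, Lam, Q):
    """J[i][j] = mu(m_i) * sum_{l,r} P[i][l] * Lam[l][r] * Q[r][j]."""
    n = len(mu)
    return [[mu[i] * sum(P[i][l] * Lam[l][r] * Q[r][j]
                         for l in range(n) for r in range(n))
             for j in range(n)] for i in range(n)]

def mutual_information(J):
    """Eq. (28), in bits, from the joint matrix J."""
    px = [sum(row) for row in J]
    py = [sum(col) for col in zip(*J)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(J)
               for j, p in enumerate(row) if p > 0)

def binary_entropy(e):
    return -e * log2(e) - (1 - e) * log2(1 - e)

# Illustrative values only (hypothetical agents and channel).
mu = [0.5, 0.5]
Lam = [[0.9, 0.1], [0.1, 0.9]]
P_v, Q_v = [[0.8, 0.2], [0.1, 0.9]], [[0.0, 1.0], [1.0, 0.0]]
P_u, Q_u = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]

I_vu = mutual_information(joint_matrix(mu, P_v, Lam, Q_u))
I_uv = mutual_information(joint_matrix(mu, P_u, Lam, Q_v))
I_avg = 0.5 * (I_vu + I_uv)            # eq. (30)
capacity = 1 - binary_entropy(0.1)     # capacity of this channel
print(I_vu, I_uv, I_avg, capacity)
```

In this configuration the direction $A^u \to A^v$ attains the capacity even though its decoder swaps the two events, which already hints that mutual information alone says nothing about consistency.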
III. CONSISTENT INFORMATION 

To obtain the amount of consistent information shared 
between $A^v$ and $A^u$, we must find a special type of correlation 
between $X_\Omega$ and $X'_\Omega$. Specifically, we are concerned 
with the observations of both coder and decoder such 
that the input and the output are the same element, i.e., 
the fraction of information that can be extracted from 
the observation of all consistent pairs $\mathbb{P}_{A^v \to A^u}(m_i, m_i)$. 
This fraction is captured by the so-called referential parameter, 
and its derivation is the objective of the next 
subsection. 



A. The Referential parameter 

The mutual information among two random variables 
is obtained by exploring the behavior of input/output 
pairs, averaging the logarithm of the relation among the 
actual probability to find a given pair and the one ex- 
pected by chance. Consistently, the referential parame- 
ter is thus obtained by averaging the fraction of informa- 
tion that can be extracted by observing consistent pairs 
against the whole information we can obtain by looking 
at all possible ones. 



1. Derivation of the Referential Parameter $\sigma$ 

Following the standard definitions of the information 
conveyed by a signal (Shannon 1948), the information 
we extract from the observation of an input-output pair 
$\langle m_i, m_j \rangle$ is: 

$$-\log \mathbb{P}_{A^v \to A^u}(m_i, m_j). \tag{32}$$



Following eq. (24), the average information obtained 
from the observation of pairs will be precisely the joint 
entropy between $X_\Omega$ and $X'_\Omega$, $H(X_\Omega, X'_\Omega)$: 

$$H(X_\Omega, X'_\Omega) = -\sum_{i,j \le n} \mathbb{P}_{A^v \to A^u}(m_i, m_j) \log \mathbb{P}_{A^v \to A^u}(m_i, m_j).$$

Let us simplify the notation by defining a matrix $J$. The 
elements of such a matrix are the joint probabilities, 
namely: 

$$J_{ij} = \mathbb{P}_{A^v \to A^u}(m_i, m_j). \tag{33}$$



From the above matrix, we can identify the contributions 
of the consistent pairs by looking at the elements of 
the diagonal. The relative impact of consistent pairs on 
the overall measure of information will define the referential 
parameter associated with the communicative exchange 
$A^v \to A^u$, to be indicated as $\sigma_{A^v \to A^u}$. This is our key 
definition, and its explicit form will be: 

$$\sigma_{A^v \to A^u} = \frac{-\mathrm{tr}(J \log J)}{H(X_\Omega, X'_\Omega)}, \tag{34}$$

where $\mathrm{tr}(J \log J)$ is the trace of the matrix $J \log J$, whose 
entries are $J_{ij} \log J_{ij}$, i.e.: 

$$\mathrm{tr}(J \log J) = \sum_{i \le n} J_{ii} \log J_{ii}. \tag{35}$$

By dividing $-\mathrm{tr}(J \log J)$ by $H(X_\Omega, X'_\Omega)$ we capture the fraction 
of bits obtained from the observation of consistent pairs 
against all possible pairs $\langle m_i, m_j \rangle$ (see footnote 2). 

The amount of consistent information, $\mathcal{I}(A^v \to A^u)$, 
is obtained by weighting the overall mutual information 
with the referential parameter: 

$$\mathcal{I}(A^v \to A^u) = I(A^v \to A^u)\, \sigma_{A^v \to A^u}. \tag{36}$$



The average of consistent information between two agents, 
$\mathcal{F}(A^v, A^u)$, will be, consistently: 

$$\mathcal{F}(A^v, A^u) = \frac{1}{2} \left( \mathcal{I}(A^v \to A^u) + \mathcal{I}(A^u \to A^v) \right). \tag{37}$$

Since $\sigma_{A^v \to A^u} \in [0, 1]$, from the definition of channel capacity 
and the symmetry properties of the mutual information, 
it is straightforward to show that: 

$$\mathcal{F}(A^v, A^u) \le \langle I(A^v, A^u) \rangle \le C(\Lambda).$$



Eqs. (34), (36) and (37) are the central equations of this 
paper. Let us focus on eq. (36). In this equation, we 
derive the average of consistent bits in a minimal system 
consisting of two agents (coder/decoder). Consistent 
information has been obtained by mathematically 
inserting the dual nature of the communicative sign, 
which forces the explicit presence of coder, channel and 
decoder modules, and subsequently selecting the subset 
of correlations for which the input symbol (the specific 
realization of $X_\Omega$) is equal to the output symbol (i.e., 
the specific realization of $X'_\Omega$). Eq. (37) accounts for 
the (possibly) symmetrical nature of the communicative 
exchange among agents: a priori, all agents can be both 
coder and decoder, and we have to evaluate and average 
the two possible configurations. The information-theoretic 
flavour of $\mathcal{F}$ enables us to study the conservation 
of referentiality from the well-grounded framework 
of Information Theory. 
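
A minimal Python sketch of eqs. (34) to (36) follows. It takes a joint matrix $J$ as input and returns the referential parameter and the consistent information. The function names and the example matrix are hypothetical, and the logarithm is taken in base 2 so that results are in bits.

```python
from math import log2

def xlogx(p):
    """p * log2(p), with the usual convention 0 * log 0 = 0."""
    return p * log2(p) if p > 0 else 0.0

def referential_parameter(J):
    """Eq. (34): diagonal contribution over the joint entropy."""
    diagonal_term = -sum(xlogx(J[i][i]) for i in range(len(J)))
    joint_entropy = -sum(xlogx(p) for row in J for p in row)
    return diagonal_term / joint_entropy  # joint entropy > 0 for n >= 2 events

def mutual_information(J):
    """Eq. (28), in bits."""
    px = [sum(row) for row in J]
    py = [sum(col) for col in zip(*J)]
    return sum(p * log2(p / (px[i] * py[j]))
               for i, row in enumerate(J)
               for j, p in enumerate(row) if p > 0)

def consistent_information(J):
    """Eq. (36): mutual information weighted by the referential parameter."""
    return mutual_information(J) * referential_parameter(J)

# Illustrative joint matrix (not taken from the paper).
J_example = [[0.4, 0.1], [0.2, 0.3]]
sigma = referential_parameter(J_example)
I_mutual = mutual_information(J_example)
I_consistent = consistent_information(J_example)

# A perfectly consistent system: all probability mass on the diagonal.
J_diag = [[0.5, 0.0], [0.0, 0.5]]
sigma_diag = referential_parameter(J_diag)
print(sigma, I_mutual, I_consistent, sigma_diag)
```

The sketch makes explicit that the consistent information can never exceed the mutual information, since the weight lies between 0 and 1.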



B. General Behavior of Consistent Information 

So far we have been concerned with the derivation of 
the amount of information which is consistently decoded, 
taking into account the dual nature of the communicative 
sign (equations (34), (36) and (37)). Now we explore some 
of its properties, and we highlight the conceptual and 
quantitative differences between $\mathcal{I}$ and $I$. 

2 We might notice that the amount of information carried by consistent 
pairs resembles the formal exposition of the Von Neumann 
entropy for quantum states, $S(\rho)$, which captures the degree of 
mixture of a given quantum state and its associated uncertainty 
in measuring (Von Neumann 1936). In this way, we observe that 
$S$ can be, roughly speaking, identified with an indicator of the 
consistency of the quantum state. However, it is worth noting 
that these measures are conceptually and formally different. 

To study the behavior of $\mathcal{I}$ and its relation to $I$, we 
will isolate the three most salient features. Specifically, 
we shall concern ourselves with the following logical 
implications: 

$$\text{i)} \quad (\sigma_{A^v \to A^u} = 1) \Rightarrow (I(A^v \to A^u) = H(X_\Omega)), \tag{38}$$

$$\text{ii)} \quad (\sigma_{A^v \to A^u} = 1) \nLeftarrow (I(A^v \to A^u) = H(X_\Omega)). \tag{39}$$

The first implication, i), refers to the perfect conservation 
of referentiality, which, in turn, implies maximum mutual 
information. However, the converse, ii), is not generally 
true, since, as we shall see, there are many situations 
in which the mutual information is maximum although 
there is no conservation of referentiality. Furthermore, 
we consider a third case, the noisy channel (which implies 
that $H(X_\Omega \mid X'_\Omega) > 0$). In this case: 

$$\text{iii)} \quad H(X_\Omega) > I(A^v \to A^u) > \mathcal{I}(A^v \to A^u). \tag{40}$$

We begin with the implications i) and ii). In both 
cases, the whole process is noiseless, since from eq. (27) 
$\max I(A^v \to A^u) = H(X_\Omega)$. To address the first logical 
implication, i), we obtain the typology of configurations 
of $P^v, \Lambda, Q^u$ leading to $\sigma_{A^v \to A^u} = 1$. We observe that 
the condition (39) is achieved if $\mathbb{P}(X'_\Omega \mid X_\Omega) = \mathbf{1}$, i.e., the 
identity matrix: 

$$\mathbf{1}_{ij} = \begin{cases} 1 & \text{iff } i = j \\ 0 & \text{otherwise.} \end{cases} \tag{41}$$

Such a condition only holds if 

$$P^v = (\Lambda Q^u)^{-1}, \tag{42}$$



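since given a square matrix $A$, $A \cdot A^{-1} = \mathbf{1}$, provided that 
$A^{-1}$ exists. From the conditions imposed over the transition 
matrices provided in eqs. (6), (12) and (17), the above relation 
is fulfilled if and only if all the matrices $P^v, \Lambda, Q^u$ 
are permutation matrices. Let us briefly review this concept, 
which will be useful in the following lines. A permutation 
matrix is a square matrix which has exactly one 
entry equal to 1 in each row and each column and 0's 
elsewhere. For example, if $n = 3$, we have 6 permutation 
matrices, namely: 

$$\begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix},\;
\begin{pmatrix} 1&0&0\\0&0&1\\0&1&0 \end{pmatrix},\;
\begin{pmatrix} 0&1&0\\1&0&0\\0&0&1 \end{pmatrix},\;
\begin{pmatrix} 0&1&0\\0&0&1\\1&0&0 \end{pmatrix},\;
\begin{pmatrix} 0&0&1\\1&0&0\\0&1&0 \end{pmatrix},\;
\begin{pmatrix} 0&0&1\\0&1&0\\1&0&0 \end{pmatrix}. \tag{43}$$

The set of $n \times n$ permutation matrices is indicated as $\Pi_{n \times n}$, 
and it can be shown that, if $A \in \Pi_{n \times n}$, 

$$A^{-1} = A^T \in \Pi_{n \times n}, \tag{44}$$

and, if $A, B \in \Pi_{n \times n}$, the product $AB \in \Pi_{n \times n}$. 
Furthermore, it is clear that $\mathbf{1} \in \Pi_{n \times n}$. If we translate 
the above facts about permutation matrices to our problem, 
we find that $\sigma_{A^v \to A^u} = 1$ is achieved if: 

$$(P^v, \Lambda, Q^u \in \Pi_{n \times n}) \;\text{ and }\; P^v = (\Lambda Q^u)^T, \tag{45}$$

leading to the following chain of equalities, which only 
holds in this special case: 

$$\mathcal{I}(A^v \to A^u) = I(A^v \to A^u) = \max I(A^v \to A^u) = H(X_\Omega).$$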

Case ii) is easily demonstrated by observing that, if 
$\mathbb{P}(X_\Omega \mid X'_\Omega) \in \Pi_{n \times n}$, then $\mathbb{P}(X'_\Omega \mid X_\Omega) \in \Pi_{n \times n}$ and thus 

$$H(X_\Omega \mid X'_\Omega) = 0, \tag{46}$$

leading to: 

$$I(A^v \to A^u) = \max I(A^v \to A^u) = H(X_\Omega), \tag{47}$$

which is achieved only by imposing that 

$$P^v, \Lambda, Q^u \in \Pi_{n \times n}. \tag{48}$$

However, as we saw above, only a special configuration of 
permutation matrices leads to $\sigma_{A^v \to A^u} = 1$. Thus, for the 
majority of cases where $I(A^v \to A^u) = \max I(A^v \to A^u)$, 
the conservation of referentiality fails, leading to 

$$I(A^v \to A^u) > \mathcal{I}(A^v \to A^u), \tag{49}$$

unless condition (45) is satisfied. Let us notice that 
there are limit cases where, although $I(A^v \to A^u) = 
\max I(A^v \to A^u)$, $\mathcal{I}(A^v \to A^u) = 0$, since it is possible 
to find a configuration of $P^v, \Lambda, Q^u \in \Pi_{n \times n}$ such that 
$\mathbb{P}(X'_\Omega \mid X_\Omega)$ is a permutation matrix with all zeros in the 
main diagonal, leading to $\sigma_{A^v \to A^u} = 0$. 
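
The noiseless cases i) and ii) can be explored exhaustively for a small system. The Python sketch below assumes $n = 3$ equiprobable events (an assumption made only for illustration) and enumerates every permutation that the composite map coder, channel and decoder can realise, computing the mutual information and the referential parameter for each.

```python
from itertools import permutations
from math import log2

def xlogx(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * log2(p) if p > 0 else 0.0

n = 3
mu = [1.0 / n] * n  # assumed uniform distribution over events
rows = []
for perm in permutations(range(n)):
    # Noiseless composite map: event m_i is always decoded as m_perm[i].
    J = [[mu[i] if perm[i] == j else 0.0 for j in range(n)]
         for i in range(n)]
    py = [sum(col) for col in zip(*J)]
    mi = sum(p * log2(p / (mu[i] * py[j]))
             for i, row in enumerate(J)
             for j, p in enumerate(row) if p > 0)
    joint_entropy = -sum(xlogx(p) for row in J for p in row)
    sigma = -sum(xlogx(J[i][i]) for i in range(n)) / joint_entropy
    rows.append((perm, mi, sigma))
    print(perm, round(mi, 3), round(sigma, 3))
```

All six composite maps reach the maximum mutual information $\log 3$, yet only the identity map gives a referential parameter equal to 1, and the two permutations without fixed points give 0, in line with the limit cases described above.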

Case iii) is by far the most interesting, since natural 
systems are noisy, and the conclusion could invalidate 
some results concerning the information measures related 
to systems where referentiality is important. The first inequality 
trivially derives from equation (21), from which 
we conclude that $I(A^v \to A^u) < H(X_\Omega)$. The argument 
to demonstrate the second inequality rests on the following 
implication: 

$$(H(X_\Omega \mid X'_\Omega) > 0) \Rightarrow (\mathbb{P}_{A^v \to A^u}(X'_\Omega \mid X_\Omega) \notin \Pi_{n \times n}). \tag{50}$$

Indeed, let us proceed by contradiction: let us suppose 
that $\mathbb{P}_{A^v \to A^u}(X'_\Omega \mid X_\Omega) \in \Pi_{n \times n}$. Then, as discussed 
above, $\mathbb{P}_{A^v \to A^u}(X_\Omega \mid X'_\Omega) \in \Pi_{n \times n}$. But this would imply 
that $H(X_\Omega \mid X'_\Omega) = 0$, thus contradicting the premise that 
$H(X_\Omega \mid X'_\Omega) > 0$. 

This has a direct consequence. Since such conditional 
probabilities satisfy eq. (17), more than $n$ matrix 
elements of $\mathbb{P}_{A^v \to A^u}(X_\Omega \mid X'_\Omega)$ must be different from zero. 
The same applies to the matrix of joint probabilities $J$ 
and thus it also applies to $-J \log J$. Since the trace is a 
sum of $n$ elements, it should be clear that, under noise: 

$$H(X_\Omega, X'_\Omega) > -\mathrm{tr}(J \log J), \tag{51}$$

leading to: 

$$\sigma_{A^v \to A^u} < 1, \tag{52}$$



thus recovering the chain of inequalities provided in eq. 
(40): 

$$H(X_\Omega) > I(A^v \to A^u) > \mathcal{I}(A^v \to A^u). \tag{53}$$

If we extend the reasoning to the symmetrical consistent 
information $\mathcal{F}(A^v, A^u)$ defined in (37): 

$$\mathcal{F}(A^v, A^u) < \langle I(A^v, A^u) \rangle. \tag{54}$$



We see that referentiality conservation introduces an 
extra source of dissipation of information. In those scenarios 
where referentiality conservation is an important 
advantage, the dissipation of information, $I_D$, between two 
agents has two components: 

$$I_D = \underbrace{H(X_\Omega \mid X'_\Omega)}_{\text{physical noise}} + \underbrace{(1 - \sigma_{A^v \to A^u})\, I(A^v \to A^u)}_{\text{referential noise}}, \tag{55}$$

the amount of useful information being provided by the consistent 
information, namely: 

$$\mathcal{I}(A^v \to A^u) = H(X_\Omega) - I_D. \tag{56}$$



IV. CASE STUDY: THE BINARY SYMMETRIC 
CHANNEL 

As an illustration of our general formalism, let us 
consider the standard example of a binary symmetric 
channel where we have two agents, $A^v, A^u$, sharing a 
world with two events, namely $\Omega = \{m_1, m_2\}$, such that 
$\mu(m_1) = \mu(m_2) = 1/2$. 

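Case 1: Non-preservation of referentiality. We 
will consider a case where $I(A^v \to A^u) = \max I$ but 
$\sigma_{A^v \to A^u} = \mathcal{I}(A^v \to A^u) = 0$. The transition matrices of 
agents $A^v$ and $A^u$ are identical: $P^v = P^u$ and $Q^v = Q^u$ are 
$2 \times 2$ permutation matrices such that coding followed by 
decoding exchanges the two events, i.e., 

$$P^v Q^u = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{57}$$

The channel between such agents, $\Lambda$, is noiseless: 

$$\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{58}$$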



We begin by identifying the different elements involved 
in the process. First, from eq. (14) we obtain: 

$$\mu'(m_1) = \mu'(m_2) = \frac{1}{2}.$$

The matrix of joint probabilities, $J$, is (see eq. (33)): 

$$J = \begin{pmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{pmatrix}. \tag{59}$$



Thus, rearranging terms, the mutual information from 
$A^v$ to $A^u$ (see eq. (28)) will be: 

$$I(A^v \to A^u) = \log 2 = 1 \text{ bit}.$$

We observe that, for a communication system consisting 
of two possible signals, 

$$\max I = \log 2 = 1 \text{ bit}. \tag{60}$$



Thus the mutual information is maximum. However, it 
is evident that such a system does not preserve referentiality, 
since, if $X_\Omega = m_1$, then $X'_\Omega = m_2$, and vice versa. 
Indeed, let us first obtain the matrix $-J \log J$, which will 
be: 

$$-J \log J = \begin{pmatrix} 0 & -\frac{1}{2} \log \frac{1}{2} \\ -\frac{1}{2} \log \frac{1}{2} & 0 \end{pmatrix}. \tag{61}$$

And, thus, by its definition (eq. 34), the referential term will be 

$$\sigma_{A^v \to A^u} = \frac{-\mathrm{tr}(J \log J)}{\log 2} = 0 \tag{62}$$

(notice that $\log 2 = 1$, although we keep the logarithm 
for the sake of clarity), the amount of consistent 
information being: 

$$\mathcal{I}(A^v \to A^u) = 0 \text{ bits}. \tag{63}$$



This extreme case dramatically illustrates the non-trivial 
relation between $\mathcal{I}$ and $I$, presenting a situation where the 
communication system is completely useless, although 
the mutual information between the random variables 
describing the input and the output is maximum. 

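Case 2: Preservation of referentiality. In 
this configuration, the referentiality is conserved. Let us 
suppose a different configuration of the agents. Now the 
transition matrices of agents $A^v$ and $A^u$ are identical: 
$P^v = P^u$ and $Q^v = Q^u$ are $2 \times 2$ permutation matrices 
such that coding followed by decoding returns each event 
to itself, i.e., 

$$P^v Q^u = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{64}$$

The channel between such agents, $\Lambda$, is the two-dimensional 
noiseless channel defined in eq. (58). It is 
straightforward to check that the mutual information is 
maximal (= 1 bit), as above. The matrix $-J \log J$ will 
be, now, 

$$-J \log J = \begin{pmatrix} -\frac{1}{2} \log \frac{1}{2} & 0 \\ 0 & -\frac{1}{2} \log \frac{1}{2} \end{pmatrix}. \tag{65}$$

This leads to $\sigma_{A^v \to A^u} = 1$, and, consequently: 

$$\mathcal{I}(A^v \to A^u) = I(A^v \to A^u). \tag{66}$$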
The above configuration is the only one which leads to 
$\mathcal{I} = I$. Furthermore, as shown in section III.B, it can 
only be achieved when $I$ is maximum, i.e., in a noiseless 
scenario. In the last example we will deal with a noisy 
situation. 

Case 3: Noisy channel. We finally explore the case 
where the matrix configuration of agents is the same as 
in the above example (eq. (64)) but the channel is noisy, 
namely: 

$$\Lambda = \begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix}. \tag{67}$$



We first derive the matrix of joint probabilities, $J$, which 
takes the following form: 

$$J = \begin{pmatrix} 0.45 & 0.05 \\ 0.05 & 0.45 \end{pmatrix}. \tag{68}$$

We now proceed by observing that $\mu'(m_1) = \mu'(m_2) = 
1/2$. Thus, the mutual information will be: 

$$I(A^v \to A^u) = 0.531\ldots \text{ bits}. \tag{69}$$



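To evaluate the degree of consistency of the communicative 
system, we first compute the matrix $-J \log J$: 

$$-J \log J = \begin{pmatrix} -0.45 \log 0.45 & -0.05 \log 0.05 \\ -0.05 \log 0.05 & -0.45 \log 0.45 \end{pmatrix} = \begin{pmatrix} 0.518 & 0.216 \\ 0.216 & 0.518 \end{pmatrix}. \tag{70}$$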

Since $H(X_\Omega, X'_\Omega) = 1.468$ bits, the referential parameter is 

$$\sigma_{A^v \to A^u} = \frac{-\mathrm{tr}(J \log J)}{H(X_\Omega, X'_\Omega)} = \frac{0.518 + 0.518}{1.468} = 0.706\ldots \text{ consistent bits/bit} \tag{71}$$

(where the last "bit" refers to "bit obtained from the 
observation of input-output pairs"). The consistent information 
is, thus: 

$$\mathcal{I}(A^v \to A^u) = I(A^v \to A^u)\, \sigma_{A^v \to A^u} = 0.531 \times 0.706 = 0.375 \text{ bits}. \tag{72}$$

Due to the symmetry of the problem, the average between 
the two agents is: 

$$\mathcal{F}(A^v, A^u) = 0.375 \text{ bits}. \tag{73}$$
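
The figures of this noisy case can be reproduced with a few lines of Python. The sketch below assumes that the coder and the decoder both act as the identity map, so that the joint matrix is simply $\mu(m_i)\Lambda_{ij}$; this is an assumption chosen because it yields the joint matrix quoted for this case. It also evaluates the two dissipation terms of eq. (55). Variable names are hypothetical.

```python
from math import log2

def xlogx(p):
    """p * log2(p), with the convention 0 * log 0 = 0."""
    return p * log2(p) if p > 0 else 0.0

mu = [0.5, 0.5]
Lam = [[0.9, 0.1], [0.1, 0.9]]
# Assumption: identity coder and decoder, so J[i][j] = mu[i] * Lam[i][j].
J = [[mu[i] * Lam[i][j] for j in range(2)] for i in range(2)]

H_in = -sum(xlogx(sum(row)) for row in J)          # H(X)
H_out = -sum(xlogx(sum(col)) for col in zip(*J))   # H(X')
H_joint = -sum(xlogx(p) for row in J for p in row) # H(X, X')

mutual = H_in + H_out - H_joint                    # I(X : X')
sigma = -sum(xlogx(J[i][i]) for i in range(2)) / H_joint
consistent = mutual * sigma                        # eq. (36)
physical_noise = H_joint - H_out                   # H(X | X')
referential_noise = (1 - sigma) * mutual
dissipated = physical_noise + referential_noise    # eq. (55)

print(round(mutual, 3), round(sigma, 3),
      round(consistent, 3), round(dissipated, 3))
```

Rounded to three decimals, the sketch gives 0.531 bits of mutual information, a referential parameter of 0.706 and 0.375 bits of consistent information, and the dissipated information complements the consistent information to $H(X_\Omega) = 1$ bit, as eq. (56) requires.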

The amount of dissipated information is, thus: 

$$I_D = \underbrace{0.469}_{\text{physical noise}} + \underbrace{0.156}_{\text{referential noise}} = 0.625 \text{ bits}. \tag{74}$$

We want to stress the following point: the matrix configuration 
is consistent with the framework proposed in 
case 2, where the amount of consistent information is 
maximum, but now the channel is noisy. The noisy 
channel has a double effect: first, it destroys information 
in the standard sense, since the noise term 
$H(X_\Omega \mid X'_\Omega) > 0$, but it also has an impact on the consistency 
of the process, introducing an amount of referential 
noise due to the lack of consistency derived from it. Thus, 
as derived in section III.B, eq. (40), in the presence of 
noise the inequalities 

$$H(X_\Omega) > I(A^v \to A^u) > \mathcal{I}(A^v \to A^u) \tag{75}$$

hold, being, in our special case: 

$$1 > 0.531 > 0.375. \tag{76}$$

V. DISCUSSION 

The accurate definition of the amount of information 
carried by consistent input/output pairs is an important 
component of information transfer in biological or artificial 
communicating systems. In this paper we explore the 
central role of information exchanges in selective scenarios, 
highlighting the importance of the referential value 
of the communicative sign. 

The conceptual novelty of the paper can be 
easily understood from the role we attribute to noise. 
Physical information considers a source of $H(X)$ bits and 
a dissipation of $H(X \mid Y)$ bits due to, for example, thermal 
fluctuations. We add another source of information dissipation: 
the non-consistency of the pair signal/referent, 
putting aside the degree of correlation among random 
variables (see eq. (55)). Indeed, in many physical processes 
no referentiality is at work, perhaps because it is 
not relevant to wonder about the consistency of the communicative 
process. Moreover, if the whole system is designed, 
consistency problems are a priori ruled out, unless 
the engineer wants to explicitly introduce disturbances 
in the system. What makes biology different, however, is 
that biological systems are not designed but are instead 
the outcomes of an evolutionary process where the nature 
of the response to a given stimulus is important, which 
makes the problem of consistency relevant for evolutionary 
scenarios. This problem needs an explicit formulation, 
and what we called consistent information is the theoretical 
object that links raw information and function, 
or environmental response. 

Are information processing mechanisms of living systems 
optimal regarding referentiality conservation? As 
we discussed above, it seems reasonable to assume that 
the conservation of referentiality must be at the core of 
any communicative system with some selective advantage. 
The general problem of finding the optimal code, 
however, resembles the problem of finding the channel 
capacity, for which it is well known that no general procedure 
exists (Cover and Thomas 1991). Thus, how do autonomous 
systems deal with such a huge mathematical 
problem? One may consider the possibility of the co-evolution 
of the abstract coding and decoding entities; 
this would spare the system from facing a great number of 
configurations per generation, all options being 
highly limited at each generation where selection is at 
work. 

We finally emphasize that the unavoidable dissipa- 
tion of mutual information points to a reinterpretation 
of information-transfer phenomena in biological or self- 
organized systems, due to the important consequences 
that can be derived from it. Further work should ex- 
plore the relevance of this limitation on more realistic 
scenarios, together with other implications that can be 
derived by placing equation (36) at the center of infor- 
mation transfer in biology. 